The Ames Housing dataset was compiled by Dean De Cock for use in data science education. It is a modernized and expanded alternative to the often-cited Boston Housing dataset.
With 79 explanatory variables describing (almost) every aspect of residential homes in Ames, Iowa, this competition challenges you to predict the final price of each home.
To download the dataset and find more information, visit the House Prices dataset on Kaggle.
df
test
1- Lowercase the column names
df
test
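A minimal sketch of the lowercasing step. The two small frames below are made-up stand-ins for `df` and `test` (the real ones come from the Kaggle CSV files), so only the `.str.lower()` lines are the point:

```python
import pandas as pd

# Toy stand-ins for the train and test frames (values are made up).
df = pd.DataFrame({"Id": [1, 2], "LotArea": [8450, 9600], "SalePrice": [208500, 181500]})
test = pd.DataFrame({"Id": [3, 4], "LotArea": [11250, 9550]})

# Lowercase every column name in both frames so they stay consistent.
df.columns = df.columns.str.lower()
test.columns = test.columns.str.lower()

print(list(df.columns))
print(list(test.columns))
```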
2- Find the nulls in the datasets:
3- Use fillna() to fill the nulls in the data:
- Check nulls for trainset
df.isnull().sum()
- Check nulls for testset
test.isnull().sum()
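The fill step could look something like the sketch below. The fill values are an assumption, since the notebook does not say which ones were used: here numeric columns get their median and text columns get the string "None". `fill_nulls` is a hypothetical helper name and the frame is toy data.

```python
import numpy as np
import pandas as pd

# Toy frame with one missing value per column (values are made up).
df = pd.DataFrame({
    "lotfrontage": [65.0, np.nan, 80.0],
    "garagetype": ["Attchd", None, "Detchd"],
})


def fill_nulls(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with nulls filled (assumed strategy, see text above)."""
    filled = frame.copy()
    for col in filled.columns:
        if pd.api.types.is_numeric_dtype(filled[col]):
            # Numeric column: fill with the column median.
            filled[col] = filled[col].fillna(filled[col].median())
        else:
            # Text column: fill with a placeholder category.
            filled[col] = filled[col].fillna("None")
    return filled


df_filled = fill_nulls(df)
print(df_filled.isnull().sum())
```

The same function can be applied to `test`. A stricter variant would reuse the medians computed on the training set instead of recomputing them on the test set.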
Saleprice with:
- MSSubClass: Identifies the type of dwelling involved in the sale
- LotFrontage: Linear feet of street connected to property
- LotArea: Lot size in square feet
- OverallQual: Rates the overall material and finish of the house

Saleprice with:
- OverallCond: Rates the overall condition of the house
- YearBuilt: Original construction date
- GarageCars: Size of garage in car capacity
- GarageArea: Size of garage in square feet

Saleprice with:
- GrLivArea: Above grade (ground) living area square feet
- 1stFlrSF: First floor square feet
- FullBath: Full bathrooms above grade
- TotRmsAbvGrd: Total rooms above grade (does not include bathrooms)

Saleprice with:
BsmtCond: Evaluates the general condition of the basement
Saleprice with:
BsmtFinType1: Rating of basement finished area
Saleprice with:
YearRemodAdd: Remodel date (same as construction date if no remodeling or additions)
Saleprice with:
- TotalBsmtSF: Total square feet of basement area
- MasVnrArea: Masonry veneer area in square feet

Saleprice with:
SaleType: Type of sale
Saleprice with:
PavedDrive: Paved driveway
| Column name | Accepted as predictor | Reason |
|---|---|---|
| MSSubClass | F | informative, but poorly correlated with price |
| LotFrontage | T | informative; good correlation with price |
| LotArea | T | informative; good correlation with price |
| OverallQual | F | rating scale bundles many aspects; not solid information, so it can't be relied on |
| OverallCond | F | rating scale bundles many aspects; not solid information, so it can't be relied on |
| YearBuilt | T | informative; good correlation with price |
| GarageArea | T | informative; good correlation with price |
| GarageCars | T | informative; good correlation with price |
| GrLivArea | T | informative; good correlation with price |
| 1stFlrSF | T | informative; good correlation with price |
| FullBath | T | informative; good correlation with price |
| TotRmsAbvGrd | T | informative; good correlation with price |
| GarageYrBlt | T | informative; good correlation with price |
| Fireplaces | T | informative; good correlation with price |
| 2ndFlrSF | T | informative; good correlation with price |
| HalfBath | T | informative; good correlation with price |
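
The "correlation with price" judgments in the table can be checked numerically. This is only a sketch on made-up numbers, assuming lowercased column names and Pearson correlation (the default of `DataFrame.corr`); the notebook itself appears to judge this from plots.

```python
import pandas as pd

# Toy data (made up): one feature that tracks price, one that does not.
df = pd.DataFrame({
    "grlivarea": [1000, 1500, 2000, 2500],
    "mssubclass": [20, 60, 20, 60],
    "saleprice": [100000, 150000, 210000, 240000],
})

# Pearson correlation of every numeric column with the target,
# strongest positive correlation first.
corr = df.corr()["saleprice"].drop("saleprice").sort_values(ascending=False)
print(corr)
```

Columns near the top of `corr` are candidates to keep as predictors. Columns near zero, like `mssubclass` in this toy example, are candidates to drop.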
We will handle the categorical variables with the dummy-variables technique.
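A small sketch of dummy encoding with `pd.get_dummies`, on made-up data. The `reindex` step is an addition not described in the notebook: it makes the test frame carry exactly the training columns when a category appears in only one of the two sets.

```python
import pandas as pd

# Toy frames (made up). "COD" appears only in test, "New" only in train.
df = pd.DataFrame({"lotarea": [8450, 9600, 11250], "saletype": ["WD", "New", "WD"]})
test = pd.DataFrame({"lotarea": [9550, 14260], "saletype": ["WD", "COD"]})

# One indicator column per category value.
df_dummies = pd.get_dummies(df, columns=["saletype"])
test_dummies = pd.get_dummies(test, columns=["saletype"])

# Align test to the training columns: categories missing from test become 0,
# categories unseen in training are dropped.
test_dummies = test_dummies.reindex(columns=df_dummies.columns, fill_value=0)

print(list(df_dummies.columns))
print(list(test_dummies.columns))
```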
StandardScaler
Using Lasso
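One way to combine scaling and Lasso is a scikit-learn pipeline, sketched below on synthetic data. The data, the `alpha=1.0` value, and the use of `make_pipeline` are all assumptions for illustration, not the notebook's actual settings.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic features on very different scales (made up).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3)) * [1000.0, 10.0, 1.0]
y = 50.0 * X[:, 0] + 2000.0 * X[:, 1] + rng.normal(scale=100.0, size=100)

# Scale first, then fit Lasso. The pipeline learns the scaling on the
# data passed to fit() and reuses it at predict time.
model = make_pipeline(StandardScaler(), Lasso(alpha=1.0))
model.fit(X, y)

print(model.score(X, y))  # R^2 on the training data
```

Scaling matters for Lasso because the L1 penalty acts on coefficient size, and coefficient size depends on each feature's units.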
Using RandomForestRegressor
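A minimal random forest sketch on synthetic data with a held-out validation split. The hyperparameters (`n_estimators=100`) and the 75/25 split are assumptions, not values taken from the notebook.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic "area -> price" style data (made up).
rng = np.random.default_rng(0)
X = rng.uniform(500, 3000, size=(200, 2))
y = 100.0 * X[:, 0] + 20.0 * X[:, 1] + rng.normal(scale=5000.0, size=200)

# Hold out a quarter of the rows to estimate generalization.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0
)

forest = RandomForestRegressor(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)

r2 = forest.score(X_val, y_val)  # R^2 on the validation rows
print(r2)
```

Tree-based models do not need feature scaling, so no `StandardScaler` is used here.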
Lasso with GridSearchCV
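Tuning the Lasso penalty with `GridSearchCV` could be sketched as follows. The data is synthetic, and the alpha grid, 5-fold CV, and scoring metric are assumptions. The step names `"scale"` and `"lasso"` are arbitrary labels chosen for this example.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (made up): two informative features, two pure noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 4))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=120)

# Scaling sits inside the pipeline so each CV fold is scaled using
# only its own training part.
pipe = Pipeline([("scale", StandardScaler()), ("lasso", Lasso(max_iter=10000))])

# "<step name>__<parameter>" addresses a parameter of a pipeline step.
param_grid = {"lasso__alpha": [0.001, 0.01, 0.1, 1.0]}

search = GridSearchCV(pipe, param_grid, cv=5, scoring="neg_mean_squared_error")
search.fit(X, y)

print(search.best_params_)
```

After fitting, `search.best_estimator_` is the pipeline refit on all the data with the best alpha and can be used directly for prediction.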